Seamless Integration of Parallelism and Memory Hierarchy
Abstract
We prove an analogue of Brent's lemma for BSP-like parallel machines featuring a hierarchical structure for both the interconnection and the memory. Specifically, for these machines we present a uniform scheme to simulate any computation designed for v processors on a v'-processor configuration with v' ≤ v and the same overall memory size. For a wide class of computations the simulation exhibits optimal O(v/v') slowdown. The simulation strategy aims at translating communication locality into temporal locality. As an important special case (v' = 1), our simulation can be employed to obtain efficient hierarchy-conscious sequential algorithms from efficient fine-grained ones.
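As a rough illustration of where a v/v' slowdown comes from, the sketch below runs one superstep of v virtual processors on v' physical ones by giving each physical processor a contiguous block of virtual processors to execute in turn. It shows only the classical Brent-style scheduling idea, not the paper's hierarchy-aware simulation scheme. The names `simulate_superstep`, `local_step`, and `v_prime` are hypothetical and chosen for this example.

```python
from math import ceil


def simulate_superstep(local_step, states, v_prime):
    """Run one superstep of len(states) virtual processors on v_prime physical ones.

    Hypothetical helper for illustration only: each physical processor p is
    assigned a contiguous block of virtual processors and executes them one
    after another. Returns the updated states and the number of sequential
    rounds used, which is the slowdown for this superstep.
    """
    v = len(states)
    if not 1 <= v_prime <= v:
        raise ValueError("need 1 <= v_prime <= v")

    block = ceil(v / v_prime)          # virtual processors per physical processor
    new_states = list(states)
    for r in range(block):             # sequential rounds
        for p in range(v_prime):       # conceptually parallel across physical processors
            i = p * block + r          # r-th virtual processor in p's block
            if i < v:
                new_states[i] = local_step(i, states[i])
    return new_states, block


if __name__ == "__main__":
    # Assumed toy workload: each virtual processor adds its index to its state.
    result, rounds = simulate_superstep(lambda i, s: s + i, list(range(8)), 2)
    print(result, rounds)
```

With contiguous blocks, virtual processors that are close together stay on the same physical processor, which loosely mirrors the abstract's aim of turning communication locality into temporal locality. Setting `v_prime = 1` gives a purely sequential execution.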
Similar resources
Increased Acetate Ester Production of Polyploid Industrial Brewer’s Yeast Strains via Precise and Seamless “Self-cloning” Integration Strategy
Background: Enhancing the industrial yeast strains ethyl acetate yield through a precise and seamless genetic manipulation strategy without any extraneous DNA sequences is an essential requisite and significant demand. Objectives: For increasing the ethyl acetate yield of industrial brewer’s yeast strain, all the ATF1 alleles were overexpressed t...
Streams: Emerging from a Shared Memory Model
To date OpenMP has been considered the work horse for data parallelism and more recently task level parallelism. The model has been one of shared memory working in parallel on arrays of a uniform nature, but many applications do not meet these often restrictive access patterns. With the development of accelerators on the one hand and moving beyond the node to the cluster on the other, OpenMP’s ...
Improving Multi-Application Concurrency Support Within the GPU Memory System
GPUs exploit a high degree of thread-level parallelism to efficiently hide long-latency stalls. Thanks to their latency-hiding abilities and continued improvements in programmability, GPUs are becoming a more essential computational resource. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-sca...
A survey of memory architecture for 3D chip multi-processors
3D chip multi-processors (3D CMPs) combine the advantages of 3D integration and the parallelism of CMPs, which are emerging as active research topics in VLSI and multi-core computer architecture communities. One significant potentiality of 3D CMPs is to exploit the diversity of integration processes and high volume of vertical TSV bandwidth to mitigate the well-known “Memory Wall” problem. Mean...
Flexible Parallel Processing in Memory: Architecture + Programming Model
VLSI technology continues to develop at a staggering rate presenting two challenges to computer designers: (i) how to capitalize on the additional resources that are available on a chip; and (ii) how to evolve computer architecture models that are well matched to the significantly changed physical parameters of new technology and the expanding needs of applications. One of the chief challenges i...
Publication date: 2002